ABSTRACT


Because it is often impractical to ask agents to provide linear orders over all alternatives, preference
completion is necessary for the resulting partial rankings. Specifically, the personalized preference
of each agent over all the alternatives can be estimated from the partial rankings that neighboring agents
provide over subsets of alternatives. However, since the agents' rankings are nondeterministic, i.e., they may
be provided with noise, it is necessary and important to conduct certainty-based preference completion.
Hence, in this paper, for alternative pairs with the obtained ranking set, a bijection is first built from
the ranking space to the preference space, and the certainty and conflict of alternative pairs are
evaluated with a well-built statistical measurement, the Probability-Certainty Density Function on subjective
probability. Then, a certainty-based voting algorithm based on certainty and conflict is
used to conduct the certainty-based preference completion. Moreover, the properties of the proposed
certainty and conflict are studied empirically, and the proposed approach to certainty-based
preference completion for partial rankings is experimentally validated against state-of-the-art
approaches on several datasets.


1. INTRODUCTION 


In a preference completion problem, given a set of agents (users) and a set of alternatives (items), each
agent has a partial ranking over a subset of alternatives, and the goal is
to infer each agent's personalized ranking or preference over all the alternatives, including
those the agent has not yet handled. It is often impractical
to ask agents to provide linear orders over all alternatives, especially in big data environments [1]. For
example, an agent may not know the status of some alternatives because there are too many
of them, which makes it hard to rank them all, or some alternatives may be
incomparable for a certain agent. All these situations result in partial rankings, and it is
necessary to introduce preference completion.


The preference completion problem arises in many areas, such as social
choice and recommender systems [2], and can be very useful in community detection [3, 4] or graph
anomaly detection [5]. For example, in social choice, each voter (agent) can cast a ballot as a ranking over
all candidates (alternatives), or as a partial ranking over some candidates. For these partial
rankings, it is necessary to form a ranking over all candidates by a certain voting rule. In a recommender
system, each user can rate some items, and the task of the system is to predict the user's ratings
of the items that he/she has not yet rated. To satisfy this requirement, two common approaches,
the matrix factorization approach and the neighborhood-based approach, have been introduced to
handle preference completion. The traditional algorithms for these two approaches are usually
rating-oriented, while a recent line of work focuses on ranking-oriented algorithms [6, 7] due to the drawbacks
of the rating-oriented algorithms. In this paper, we focus on the ranking-oriented neighborhood-based
approach.


Traditionally, neighborhood-based preference completion first finds the near neighbors of each
agent and then aggregates these neighbors' rankings to produce the predicted preference by a certain voting
rule [6]. However, this task has some inevitable issues. For example, an agent may exhibit irrational
behaviors or provide rankings in a noisy setting. To address this issue, many rating-oriented trust-based
approaches have been proposed with additional contextual information, whereas the ranking-oriented
approach still leaves much room for research. Liu et al. [8] proposed an anchor-based algorithm that
leverages many other agents' ranking information to counteract the presence of randomness.


In this paper, a certainty-based preference completion algorithm is proposed on the basis of the work of Liu et al. [8].
More precisely, after finding the k-nearest neighbors with the anchor-kNN algorithm proposed in [8],
we use the certainty-based voting algorithm introduced in this paper to complete the preference (ranking)
instead of using the traditional majority voting rule. The traditional majority voting rule tends to cause wrong
judgments, especially when both sides have close votes: in this case, even slight randomness can change
the outcome. For this reason, this paper introduces a certainty-based voting
algorithm. Importantly, when we take a vote on two alternatives, the certainty,
which measures the degree to which the two alternatives can be preferred or compared, should be introduced.
Only when the certainty value satisfies a defined threshold do we go further and make a three-way preference
decision, instead of simply assigning 0 or 1 to the two alternatives. Hence, the certainty-based voting
algorithm avoids wrong judgments when both sides have close scores or the rankings are made in a noisy setting.
Before formulating the certainty and presenting the certainty-based preference completion
algorithm, we first consider the certainty and the preference space in order to introduce the three-way preference
between two alternatives.


Technically, in a ranking pool gathered from agents, the rankings that include the alternative pair A and B can
be aggregated to form the preference between A and B. Mathematically, a bijection can be built from the
ranking space to the preference space for the alternative pair A and B. Here, the ranking space consists of all
the partial rankings on A and B from agents, while the preference space consists of the three-way preference
between A and B, which includes


• preference (prefer A to B, denoted as $P^{+}_{AB}$),
• dispreference (prefer B to A, denoted as $P^{-}_{AB}$), and
• uncertainty (no preference between A and B, denoted as $C^{u}_{AB}$),


according to the trisecting and acting models of human cognitive behaviors [1, 9]. Thus, the following three 
situations are distinguished: 


• The agents prefer alternative A to alternative B, which can be confirmed by high preference $P^{+}_{AB}$, low
dispreference $P^{-}_{AB}$, and low uncertainty $C^{u}_{AB}$.

• The agents prefer alternative B to alternative A, which can be confirmed by low preference $P^{+}_{AB}$, high
dispreference $P^{-}_{AB}$, and low uncertainty $C^{u}_{AB}$.

• The agents are uncertain about the preference between alternatives A and B, i.e., A and B are
unpreferred, which can be confirmed by low $P^{+}_{AB}$, low $P^{-}_{AB}$, and high uncertainty $C^{u}_{AB}$.


When the uncertainty $C^{u}_{AB}$ is low, the preference between A and B can be determined, i.e., A and B
are preferable. Hence, the certainty of preference, denoted as $C^{c}_{AB}$, can be introduced to describe the
trustworthiness of the preference, and it can be calculated as $C^{c}_{AB} = 1 - C^{u}_{AB}$. The certainty of preference
can be taken as the subjective probability of the preference, following the proposition that certainty is
the degree of belief that an individual has in the preference [10]. Hence, in this paper, the certainty is
evaluated with a well-built statistical measurement, which defines a bijection from the ranking space
to the preference space, enabling the estimation of the pairwise preference from neighbors' partial rankings
by mapping them to


(preference $P^{+}_{AB}$, dispreference $P^{-}_{AB}$, uncertainty $C^{u}_{AB}$).
Our definition on certainty should capture the following key properties: 


— Property 1: Certainty $C^{c}_{AB}$ increases as the number of rankings between alternative pair A and B
increases for a fixed ratio of rankings from A to B and rankings from B to A.

— Property 2: Certainty $C^{c}_{AB}$ decreases as the extent of conflict increases in the partial rankings between
alternative pair A and B.


Our main contributions in this paper can be summarized as follows:


• As pointed out in [11], it is necessary and important to introduce the certainty and conflict of the
preference between alternative pairs, and the certainty and conflict of the preference are sometimes
more important than the preference itself. In this paper, a probability-based certainty and conflict
are introduced under Properties 1 & 2 to describe the trustworthiness of the preference.

• A certainty-based voting algorithm using the certainty and conflict is proposed for conducting
certainty-based preference completion in nondeterministic settings.

• We empirically study the properties of the proposed approach, and experimentally validate it
against state-of-the-art approaches on several datasets.


This paper is organized as follows. Section 2 reviews existing work on the Plackett-Luce model, the
Kendall-Tau distance, and the anchor-kNN algorithm. In Section 3, a bijection is built from the ranking space to the
preference space, and the certainty and conflict of alternative pairs are evaluated based on a well-built
statistical measurement. In Section 4, a certainty-based voting algorithm is used to conduct
preference completion with the certainty and conflict. Section 5 empirically studies the
properties of the proposed certainty and conflict. Section 6 experimentally validates the proposed
approach against state-of-the-art approaches on several datasets. Finally, Section
7 summarizes this paper and presents future work.


2. BACKGROUND 
2.1 Plackett-Luce Model 


Given a set of m alternatives and a set of n agents, let $y(y_1, y_2, \ldots, y_m)$ denote the latent features of
the alternatives and $x(x_1, x_2, \ldots, x_n)$ denote the latent features of the agents. Agent i's ranking $R_i$ is determined by a
statistical model for ranking data. As a widely used statistical model, the Plackett-Luce model [12, 13]
is adopted to generate the rankings of agents. In this paper, each alternative is assigned a positive value
named utility. The greater this utility is, the more likely the corresponding alternative is ranked at a higher
position [14]. In [14], the realized utility of every alternative j for agent i is determined by


$$u_{ij}(x_i, y_j) = \theta(x_i, y_j) + \epsilon_{ij} \qquad (1)$$


where $\theta(x_i, y_j)$ is agent i's expected utility on alternative j, determined by the closeness of the
latent features $x_i$ and $y_j$ and measured by $\theta(x_i, y_j) = \exp(-\|x_i - y_j\|_2)$, and $\epsilon_{ij}$ is a zero-mean
independent random variable that follows a Gumbel distribution. When the set of realized utilities
$u_i(u_{i1}, u_{i2}, \ldots, u_{im})$ of agent i is obtained, agent i ranks the alternatives in decreasing order of
realized utility. After repeating this n times, synthetic datasets of all the agents can be generated for
experiments. For more details, please refer to Algorithm 1.


Algorithm 1. Sampling from Plackett-Luce Model.


Input: A latent feature set $x(x_1, x_2, \ldots, x_n)$ on n agents; a latent feature set $y(y_1, y_2, \ldots, y_m)$ on m alternatives.

Output: A dataset of rankings $R\{R_1, R_2, \ldots, R_n\}$

for $x_i$ in $x(x_1, x_2, \ldots, x_n)$ do
    for $y_j$ in $y(y_1, y_2, \ldots, y_m)$ do
        Sample $\epsilon_{ij}$ following a Gumbel distribution
        Compute $u_{ij}(x_i, y_j) = \exp(-\|x_i - y_j\|_2) + \epsilon_{ij}$
    end for
end for
for $x_i$ in $x(x_1, x_2, \ldots, x_n)$ do
    for j = 1 to m do
        Choose an alternative $y_j$ from y with a probability proportional to $u_{ij}$
        $R_i \leftarrow R_i \succ y_j$ and $y \leftarrow y \setminus \{y_j\}$
    end for
end for
return $R\{R_1, R_2, \ldots, R_n\}$
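The sampling described by Equation (1) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `sample_gumbel` and `sample_rankings`, the `seed` parameter, and the shift by the Euler-Mascheroni constant (used here to make standard Gumbel noise zero-mean) are illustrative choices and do not come from the paper. The sketch follows the textual description, i.e., each agent sorts the alternatives by realized utility in decreasing order.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of the standard Gumbel distribution


def sample_gumbel(rng):
    """Draw zero-mean Gumbel noise by inverse-CDF sampling (assumed noise model)."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u)) - EULER_GAMMA


def sample_rankings(agent_features, alt_features, seed=0):
    """Return one ranking (list of alternative indices, best first) per agent."""
    rng = random.Random(seed)
    rankings = []
    for x in agent_features:
        realized = []
        for j, y in enumerate(alt_features):
            expected = math.exp(-math.dist(x, y))  # exp(-||x_i - y_j||_2)
            realized.append((expected + sample_gumbel(rng), j))
        realized.sort(reverse=True)  # decreasing realized utility
        rankings.append([j for _, j in realized])
    return rankings
```

Calling `sample_rankings(agents, alternatives, seed=1)` with lists of coordinate tuples yields one permutation of the alternative indices per agent; a fixed seed makes the draw reproducible.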


2.2 Kendall-Tau Distance 


Given two agents' rankings $R_1$ and $R_2$ over the same alternatives, the Kendall-Tau distance can be
introduced to measure the similarity of $R_1$ and $R_2$; it is the total number of disagreements in pairwise
comparisons between alternatives in the linear rankings. For an alternative j in a ranking $R_i$, $R_i(j)$ represents
its position in $R_i$. For example, if j is the top-ranked alternative in $R_i$,
then $R_i(j) = 1$. The normalized Kendall-Tau distance between $R_1$ and $R_2$ is


$$NK(R_1, R_2) = \frac{\sum_{j_1 \neq j_2 \in R} I\big((R_1(j_1) - R_1(j_2))(R_2(j_1) - R_2(j_2)) < 0\big)}{\binom{|R|}{2}} \qquad (2)$$


where the sum runs over the unordered pairs of distinct alternatives in the shared alternative set R, and I(v) is an
indicator that is set to 1 if the argument v is true and to 0 otherwise.


Moreover, if the rankings have not shared completely the same alternatives, the intersection of the two 
alternative sets can be taken for computing the normalized Kendall-Tau distance. 
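As an illustration of the normalized Kendall-Tau distance, including the restriction to shared alternatives, a short Python sketch follows. The function name `normalized_kendall_tau` and the convention of returning 0.0 when fewer than two alternatives are shared are assumptions made for this example.

```python
from itertools import combinations


def normalized_kendall_tau(r1, r2):
    """Fraction of discordant pairs among the alternatives shared by r1 and r2.

    r1 and r2 are lists of alternatives ordered from best to worst.
    """
    shared = sorted(set(r1) & set(r2))
    if len(shared) < 2:
        return 0.0  # assumption: no comparable pair means zero distance
    pos1 = {a: p for p, a in enumerate(r1)}
    pos2 = {a: p for p, a in enumerate(r2)}
    pairs = list(combinations(shared, 2))
    discordant = sum(
        1 for a, b in pairs
        if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0
    )
    return discordant / len(pairs)
```

For instance, the rankings A>B>D>C and B>C>D>F share the alternatives B, C, and D; only the pair (C, D) is ordered differently, so the distance is 1/3.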


2.3 Anchor-kNN Algorithm 


Before introducing the anchor-kNN algorithm proposed in [8], we first present the idea of KT-kNN, which
simply uses the Kendall-Tau distance to find an agent's neighbors. If the Kendall-Tau distance between two
rankings $R_i$ and $R_j$ is small, the latent features $x_i$ and $x_j$ of the agents should be close, i.e., the two agents
have similar opinions on the alternatives.


The KT-kNN algorithm does not consider that agents' preferences may be nondeterministic or that agents'
rankings may be made in a noisy setting. In contrast, anchor-kNN uses other agents' (named
anchors) ranking data to determine the closeness of two agents rather than considering only the two agents'
rankings. Anchor-kNN develops a feature $F_{i,j}$ for agents i and j to represent the Kendall-Tau distance
between $R_i$ and $R_j$, i.e., $F_{i,j} = NK(R_i, R_j)$. Then, to measure the closeness of two agents, denoted as $D_{i,j}$, we
use the sum of the differences between $F_{i,t}$ and $F_{j,t}$ to find the k-nearest neighbors, where t is a third agent
ranging over all the other agents except i and j.
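The anchor-based closeness can be sketched as follows. This is only an illustrative reading of the description above: the names `nk` and `anchor_knn` are hypothetical, and taking the absolute value of each difference before summing is an assumption, since the text only speaks of a sum of differences.

```python
from itertools import combinations


def nk(r1, r2):
    """Normalized Kendall-Tau distance over the shared alternatives."""
    shared = sorted(set(r1) & set(r2))
    if len(shared) < 2:
        return 0.0  # assumption for rankings without a comparable pair
    p1 = {a: p for p, a in enumerate(r1)}
    p2 = {a: p for p, a in enumerate(r2)}
    pairs = list(combinations(shared, 2))
    bad = sum(1 for a, b in pairs if (p1[a] - p1[b]) * (p2[a] - p2[b]) < 0)
    return bad / len(pairs)


def anchor_knn(rankings, target, k):
    """Return the indices of the k agents closest to `target`.

    Closeness D[target][j] sums |F[target][t] - F[j][t]| over all anchors t
    other than target and j, where F[a][b] = nk(rankings[a], rankings[b]).
    """
    n = len(rankings)
    feature = [[nk(rankings[a], rankings[b]) for b in range(n)] for a in range(n)]

    def closeness(j):
        return sum(
            abs(feature[target][t] - feature[j][t])
            for t in range(n)
            if t != target and t != j
        )

    candidates = [j for j in range(n) if j != target]
    return sorted(candidates, key=closeness)[:k]
```

Two agents are therefore considered close when they disagree with the remaining agents to a similar extent, even if their own two rankings were perturbed by noise.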


3. CERTAINTY AND PREFERENCE SPACE 


In this section, we first present some preliminary definitions. For an arbitrary alternative pair A and
B, the certainty can be adopted to describe the trustworthiness of the preference between A and B.
Technically, following [15], a Probability-Certainty Density Function (PCDF) can be introduced to capture
the subjective probability of the ranking. However, unlike [15] and following [16] and [17], certainty in this
paper is defined based on the PCDF so as to satisfy Properties 1 & 2.


3.1 Ranking Space 


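The ranking space consists of all the weighted partial rankings on the alternative pair A and B from
agents, including


• the rankings $\{O^{(k)}_{AB}\}$ where A is ranked ahead of B, with weight $w^{(k)}_{AB}$ for the ranking $O^{(k)}_{AB}$; $n_{AB}$
denotes the accumulated weight of the rankings $\{O^{(k)}_{AB}\}$, i.e., $n_{AB} = \sum_k w^{(k)}_{AB}$,

• the rankings $\{O^{(k)}_{BA}\}$ where B is ranked ahead of A, with weight $w^{(k)}_{BA}$ for the ranking $O^{(k)}_{BA}$; $n_{BA}$
denotes the accumulated weight of the rankings $\{O^{(k)}_{BA}\}$, i.e., $n_{BA} = \sum_k w^{(k)}_{BA}$, and

• the unordered ones $\{O^{(k)}_{\widetilde{AB}}\}$ where A and B are not comparable, with weight $w^{(k)}_{\widetilde{AB}}$ for the
ranking $O^{(k)}_{\widetilde{AB}}$; $n_{\widetilde{AB}}$ denotes the accumulated weight of the rankings $\{O^{(k)}_{\widetilde{AB}}\}$, i.e.,
$n_{\widetilde{AB}} = \sum_k w^{(k)}_{\widetilde{AB}}$.


Obviously, we have $w^{(k)}_{\widetilde{AB}} = w^{(k)}_{\widetilde{BA}}$ and $O^{(k)}_{\widetilde{AB}} = O^{(k)}_{\widetilde{BA}}$.


Moreover, the weight $w^{(k)}_{AB}$ of $O^{(k)}_{AB}$ represents the quality of the ranking $O^{(k)}_{AB}$. Without additional
knowledge, we assign $w^{(k)}_{AB}$ to be 1.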


DEFINITION 1. Ranking space 


$$O = \{\langle n_{AB}, n_{BA}, n_{\widetilde{AB}} \rangle \mid \min\{n_{AB}, n_{BA}, n_{\widetilde{AB}}\} \geq 0\}.$$


3.2 Preference Space 


Traditionally, the uncertainty is usually ignored, and sometimes the dispreference is not taken into
account either, which leads to some disturbing results, as shown in the empirical study section. According to the
trisecting and acting models of human cognitive behaviors [9, 18], the preference space consists of the
three-way preference between alternatives, which includes


• preference $P^{+}_{AB}$ (prefer A to B),
• dispreference $P^{-}_{AB}$ (prefer B to A), and
• uncertainty $C^{u}_{AB}$ (no preference between A and B).


DEFINITION 2. Preference space


$$P = \{\langle P^{+}_{AB}, P^{-}_{AB}, C^{u}_{AB} \rangle \mid P^{+}_{AB} + P^{-}_{AB} + C^{u}_{AB} = 1, \min\{P^{+}_{AB}, P^{-}_{AB}, C^{u}_{AB}\} \geq 0\}.$$


3.3 Certainty of Rankings in Alternative Pairs 


The Bayesian inference [19, 20] here is adopted to update the probability with the available contextual 
information about the rankings in alternative pairs, i.e., update the prior distribution to the posterior 
distribution [21, 22]. Currently, the offline Bayesian inference has been utilized in this paper. The Bayesian 
inference can also be applied to online/streaming scenario [23, 24]. 


Let $x_{AB}$, $x_{BA}$, and $x_{\widetilde{AB}}$ be the probabilities of the rankings $\{O^{(k)}_{AB}\}$, $\{O^{(k)}_{BA}\}$, and
$\{O^{(k)}_{\widetilde{AB}}\}$, respectively, where $x_{\widetilde{AB}} = 1 - x_{AB} - x_{BA}$ and $X = \langle x_{AB}, x_{BA} \rangle$. In addition,
$x_{AB} \in [0,1]$, $x_{BA} \in [0,1]$, and $x_{\widetilde{AB}} \geq 0$, and thus we have $x_{AB} + x_{BA} \leq 1$.


Without any additional information, the prior distribution $f(X)$ is a uniform distribution. As the
cumulative probability of a distribution within [0,1] equals 1, the density of a PCDF has the mean value 1
within [0,1], and this makes $f(X) = 1$.


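As the ranking sample O conforms to a multinomial distribution [16, 22], we have


$$f(O \mid X) = \frac{(n_{AB} + n_{BA} + n_{\widetilde{AB}})!}{n_{AB}!\, n_{BA}!\, n_{\widetilde{AB}}!}\, (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\widetilde{AB}})^{n_{\widetilde{AB}}} \qquad (3)$$


The posterior distribution $f(X \mid O)$ can then be estimated as [16, 22]:


$$f(X \mid O) = \frac{f(O \mid X)\, f(X)}{\int f(O \mid X)\, f(X)\, dX} = \frac{(x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\widetilde{AB}})^{n_{\widetilde{AB}}}}{\int (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\widetilde{AB}})^{n_{\widetilde{AB}}}\, dX} \qquad (4)$$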


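Then, the certainty can be determined by the deviation of the posterior distribution from the prior distribution,
i.e., the uniform distribution. Hence, we have the following definition of certainty.


DEFINITION 3. The certainty $C^{c}_{AB}$ of rankings $\{\langle n_{AB}, n_{BA}, n_{\widetilde{AB}} \rangle\}$ can be estimated as


$$C^{c}_{AB} = \frac{1}{2} \int \left| f(X \mid O) - 1 \right| dX = \frac{1}{2} \int \left| \frac{(x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\widetilde{AB}})^{n_{\widetilde{AB}}}}{\int (x_{AB})^{n_{AB}} (x_{BA})^{n_{BA}} (x_{\widetilde{AB}})^{n_{\widetilde{AB}}}\, dX} - 1 \right| dX \qquad (5)$$


where the factor $\frac{1}{2}$ removes the double counting of the deviations.


From this definition, we have $C^{c}_{AB} = C^{c}_{BA}$.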
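The certainty integral can be approximated numerically. The sketch below is one possible reading, not the authors' implementation: it assumes that the integration domain is the simplex $x_{AB} + x_{BA} \leq 1$, that the posterior is normalized with the Gamma function (the normalizing constant of a Dirichlet density), and that a midpoint rule on a regular grid is accurate enough. The function name `certainty` and the `grid` parameter are hypothetical.

```python
import math


def certainty(n_ab, n_ba, n_unordered, grid=200):
    """Approximate 0.5 * integral of |posterior(X) - 1| over the simplex."""
    total = n_ab + n_ba + n_unordered
    # log of 1 / integral(x_ab^n_ab * x_ba^n_ba * rest^n_unordered dX)
    log_norm = (math.lgamma(total + 3) - math.lgamma(n_ab + 1)
                - math.lgamma(n_ba + 1) - math.lgamma(n_unordered + 1))

    def log_term(count, value):
        if count == 0:
            return 0.0
        return count * math.log(value) if value > 0 else -math.inf

    cell = 1.0 / (grid * grid)
    acc = 0.0
    for i in range(grid):
        x_ab = (i + 0.5) / grid
        for j in range(grid - i):
            x_ba = (j + 0.5) / grid
            rest = (grid - i - j - 1) / grid  # equals 1 - x_ab - x_ba
            # cells cut by the line x_ab + x_ba = 1 lie half inside the simplex
            weight = 0.5 if rest == 0 else 1.0
            log_f = (log_norm + log_term(n_ab, x_ab)
                     + log_term(n_ba, x_ba) + log_term(n_unordered, rest))
            acc += weight * abs(math.exp(log_f) - 1.0) * cell
    return 0.5 * acc
```

Under these assumptions, `certainty(20, 20, 0)` exceeds `certainty(2, 2, 0)`, in line with Property 1, and `certainty(10, 0, 0)` exceeds `certainty(5, 5, 0)`, in line with Property 2.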

3.4 Conflict of Rankings in Alternative Pairs


The conflict can be determined by the relative difference between the weighted rankings $n_{AB}$ and $n_{BA}$, as in
[17]. More specifically,


• the conflict is largest when $n_{AB} = n_{BA}$;
• the conflict is smallest when $n_{AB} = 0$ or $n_{BA} = 0$.


Hence, we have the following definition of conflict.


DEFINITION 4. The conflict $C^{f}_{AB}$ of rankings $\{\langle n_{AB}, n_{BA}, n_{\widetilde{AB}} \rangle\}$ can be estimated as


$$C^{f}_{AB} = \min\left\{ \frac{n_{AB}}{n_{AB} + n_{BA}}, \frac{n_{BA}}{n_{AB} + n_{BA}} \right\} \qquad (6)$$


From this definition, we have $C^{f}_{AB} = C^{f}_{BA}$.
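The conflict measure is simple enough to state directly in code. In this sketch the function name `conflict` is hypothetical, and returning 0.0 when there are no ordered rankings at all is an assumption, since the definition leaves that case open.

```python
def conflict(n_ab, n_ba):
    """Smaller share of the ordered rankings: 0 means unanimous, 0.5 means evenly split."""
    ordered = n_ab + n_ba
    if ordered == 0:
        return 0.0  # assumption: no ordered rankings, hence no observed conflict
    return min(n_ab, n_ba) / ordered
```

For example, 5 rankings for A over B and 5 for B over A give the largest conflict, 0.5, while 10 against 0 give the smallest, 0.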


3.5 Bijection from Ranking Space to Preference Space 


With Definitions 1, 2, 3 and 4, the following definition can be introduced. 


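DEFINITION 5. The bijection from ranking space $\{\langle n_{AB}, n_{BA}, n_{\widetilde{AB}} \rangle\}$ to preference space
$\{\langle P^{+}_{AB}, P^{-}_{AB}, C^{u}_{AB} \rangle\}$ can be estimated as


$$P^{+}_{AB} = \frac{n_{AB}}{n_{AB} + n_{BA} + n_{\widetilde{AB}}}\, C^{c}_{AB} \qquad (7)$$


$$P^{-}_{AB} = \frac{n_{BA}}{n_{AB} + n_{BA} + n_{\widetilde{AB}}}\, C^{c}_{AB} \qquad (8)$$


$$C^{u}_{AB} = 1 - C^{c}_{AB} \qquad (9)$$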


4. CERTAINTY-BASED PREFERENCE COMPLETION 


This section proposes the certainty-based preference completion approach. The framework of our
approach is shown in Figure 1. It includes two processes. The first finds the k-nearest neighbors of user i
with the anchor-kNN algorithm proposed by Liu et al. [8]. The second constructs a linear ranking for user i
over all alternatives. In this section, we focus on the second process: given the neighbors'
partial rankings, a certainty-based voting algorithm is introduced to estimate the pairwise preference for all
alternative pairs, and these pairwise preferences are then combined into a linear ranking for user i.


[Figure 1 panels. Step 1: find the k-nearest neighbors of the target user with anchor-kNN. Step 2: certainty-based
voting algorithm, which maps the neighbors' partial rankings from the ranking space to the preference space
(certainty, conflict), estimates the pairwise preference for all alternative pairs, and forms a ranking.]


Figure 1. Certainty-based preference completion process.


4.1 Certainty-based Voting Algorithm 


First, let us introduce a definition. 


DEFINITION 6. With preference space $\{\langle P^{+}_{AB}, P^{-}_{AB}, C^{u}_{AB} \rangle\}$, the following conclusions can be obtained:


• if the uncertainty $C^{u}_{AB} \geq \epsilon_1$, alternatives A and B are unpreferred;
• if $C^{u}_{AB} < \epsilon_1$,

  - if $P^{+}_{AB} - P^{-}_{AB} \geq \epsilon_2$, user i prefers A to B;

  - if $P^{-}_{AB} - P^{+}_{AB} \geq \epsilon_2$, user i prefers B to A;

  - otherwise, A and B are unpreferred;


where $\epsilon_1$ and $\epsilon_2$ are thresholds to rule out the fuzziness of comparison.


In existing work, with the rankings of the neighbors obtained by the k-nearest neighbors algorithm, common
voting rules, such as majority voting, can be used to estimate the pairwise preference for
preference completion. Common voting rules also include positional scoring rules, maximin, and Bucklin; for
more details, please refer to [21].


In contrast, in this paper, we use a certainty-based voting rule with certainty and conflict to obtain
the pairwise preference. The certainty and conflict measure the trustworthiness with which the alternatives in a
pair can be preferred or compared. If the certainty satisfies a defined threshold, we then evaluate the degrees to
which user i prefers one alternative to the other, denoted by $P^{+}_{AB}$ and $P^{-}_{AB}$. Only if the difference between
these two preferences reaches a given value do we make a preference decision on the two alternatives. Technically,
for the alternative pair A and B with $C^{u}_{AB} < \epsilon_1$ and $|P^{+}_{AB} - P^{-}_{AB}| \geq \epsilon_2$, a preference decision
between A and B can be made. The process for estimating the pairwise preference is shown in Algorithm 2. We apply
this algorithm to all alternative pairs to obtain all the pairwise preferences.


Algorithm 2. Certainty-based voting algorithm for estimating pairwise preference.


Input: A pair of alternatives A and B; neighbors' rankings $R_1, R_2, \ldots, R_k$
Output: Pairwise preference for alternatives A and B, denoted by $\psi_{AB}$

for $R_i$ in $\{R_1, R_2, \ldots, R_k\}$ do
    if A in $R_i$ and B in $R_i$ then
        $n_{AB} \leftarrow n_{AB} + 1$ if A is ranked before B, or $n_{BA} \leftarrow n_{BA} + 1$ if A is ranked after B
    else
        $n_{\widetilde{AB}} \leftarrow n_{\widetilde{AB}} + 1$
    end if
end for
Compute $C^{c}_{AB}$, $C^{u}_{AB}$, $P^{+}_{AB}$ and $P^{-}_{AB}$ using $n_{AB}$, $n_{BA}$ and $n_{\widetilde{AB}}$ with Equations (5), (7), (8) and (9)
if … then
    …
else
    …
end if
if $C^{u}_{AB} \geq \epsilon_1$ then
    $\psi_{AB} = 0$
end if
if $C^{u}_{AB} < \epsilon_1$ then
    if $P^{+}_{AB} - P^{-}_{AB} \geq \epsilon_2$ then
        $\psi_{AB} = 1$
    else if $P^{-}_{AB} - P^{+}_{AB} \geq \epsilon_2$ then
        $\psi_{AB} = -1$
    else
        $\psi_{AB} = 0$
    end if
end if
return $\psi_{AB}$
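The three-way decision of Definition 6 can be sketched in Python as follows. Several points are assumptions made for illustration: the names `count_pair` and `certainty_vote` are hypothetical, the certainty is supplied by the caller as a function `certainty_fn(n_ab, n_ba, n_unordered)` rather than computed here, every ranking has weight 1, and the preference and dispreference use the denominator of Equations (7) and (8) as given in Definition 5.

```python
def count_pair(rankings, a, b):
    """Count rankings with a before b, b before a, and without both alternatives."""
    n_ab = n_ba = n_unordered = 0
    for ranking in rankings:
        if a in ranking and b in ranking:
            if ranking.index(a) < ranking.index(b):
                n_ab += 1
            else:
                n_ba += 1
        else:
            n_unordered += 1
    return n_ab, n_ba, n_unordered


def certainty_vote(rankings, a, b, certainty_fn, eps1, eps2):
    """Return 1 (a preferred), -1 (b preferred), or 0 (unpreferred)."""
    n_ab, n_ba, n_unordered = count_pair(rankings, a, b)
    total = n_ab + n_ba + n_unordered
    if total == 0:
        return 0  # assumption: no neighbors means no decision
    c = certainty_fn(n_ab, n_ba, n_unordered)
    preference = n_ab / total * c       # Equation (7)
    dispreference = n_ba / total * c    # Equation (8)
    uncertainty = 1.0 - c               # Equation (9)
    if uncertainty >= eps1:
        return 0
    if preference - dispreference >= eps2:
        return 1
    if dispreference - preference >= eps2:
        return -1
    return 0
```

Unlike majority voting, a narrow 2-to-1 lead only produces a decision when the certainty is high enough and the gap between preference and dispreference reaches the second threshold.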

4.2 Greedy Order Algorithm


Next, let us combine all the pairwise preferences to form a linear ranking over all alternatives. One
possible approach is the greedy order algorithm [25]. This algorithm follows a greedy idea: it
always picks the alternative that currently has the maximum potential value in the alternatives
pool I and ranks it above all the other remaining items. Here, for item i, the potential value $v_i$ is equal to
$\sum_{j \in I} \psi_{i,j} - \sum_{j \in I} \psi_{j,i}$. This value aggregates all the pairwise preferences obtained in the previous
subsection and represents the preference for item i among all the neighbors' rankings. The algorithm then deletes
the picked item from the alternatives pool and updates the potential values of the remaining items by removing the
effects of the picked one. The picking process is repeated until the alternatives pool is empty, at which point a
linear ranking for user i is produced. See Algorithm 3.


Algorithm 3. Greedy order algorithm.


Input: An alternative set I; neighbors' rankings $R_1, R_2, \ldots, R_k$; the set of all pairwise preferences
$\Psi(\psi_{1,2}, \psi_{1,3}, \ldots, \psi_{m-1,m})$
Output: A complete ranking R for target user i

/* compute the potential value for every alternative */
for all $i \in I$ do
    $v_i = \sum_{j \in I} \psi_{i,j} - \sum_{j \in I} \psi_{j,i}$
end for
while I is not empty do
    $t = \arg\max_{i \in I} v_i$
    $R(t) = |I|$
    $I = I - \{t\}$
    for all $i \in I$ do
        $v_i = v_i + \psi_{t,i} - \psi_{i,t}$
    end for
end while
return R
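A compact Python sketch of the greedy ordering is given below. It is an illustration under assumptions: the name `greedy_order` is hypothetical, the pairwise preferences are passed as a dictionary keyed by ordered pairs with missing pairs counted as 0, and the result is returned as a list ordered from best to worst instead of the rank assignment R(t) = |I| used in Algorithm 3.

```python
def greedy_order(alternatives, psi):
    """Order alternatives greedily by potential value.

    psi maps an ordered pair (a, b) to the pairwise preference of a over b.
    """
    pool = list(alternatives)

    def pref(a, b):
        return psi.get((a, b), 0)

    # potential value: outgoing minus incoming pairwise preference
    potential = {
        a: sum(pref(a, b) - pref(b, a) for b in pool if b != a)
        for a in pool
    }
    ranking = []
    while pool:
        top = max(pool, key=lambda a: potential[a])
        ranking.append(top)
        pool.remove(top)
        for a in pool:
            # remove the effect of the picked alternative
            potential[a] += pref(top, a) - pref(a, top)
    return ranking
```

With consistent pairwise preferences such as A over B, B over C, and A over C, the sketch returns the order A, B, C.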


5. EMPIRICAL STUDIES ON PROPERTIES OF CERTAINTY 


In this section, we study the properties of certainty and conflict in our proposed model. 


5.1 Increasing Rankings with Fixed Conflict 


Figure 2 plots how certainty $C^{c}_{AB}$ varies with the weighted rankings $n_{AB}$ and $n_{BA}$ under fixed conflict $C^{f}_{AB}$.


[Figure 2 plot. x-axis: the total number of weighted rankings; y-axis: certainty.]


Figure 2. Certainty increases with $n_{AB} + n_{BA}$ when $\frac{n_{AB}}{n_{AB} + n_{BA}}$ and $n_{\widetilde{AB}}$ are fixed.

This should confirm Property 1. 


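THEOREM 1. For fixed $\frac{n_{AB}}{n_{AB} + n_{BA}}$ and $n_{\widetilde{AB}}$, the certainty $C^{c}_{AB}$ increases with $n_{AB} + n_{BA}$.


Proof: Let $\frac{n_{AB}}{n_{AB} + n_{BA}} = \alpha$, $n_{AB} + n_{BA} = \beta$, and


$$f(\cdot) = \frac{(x_{AB})^{\alpha\beta} (x_{BA})^{(1-\alpha)\beta} (1 - x_{AB} - x_{BA})^{n_{\widetilde{AB}}}}{\int (x_{AB})^{\alpha\beta} (x_{BA})^{(1-\alpha)\beta} (1 - x_{AB} - x_{BA})^{n_{\widetilde{AB}}}\, dX} \qquad (10)$$


Then we have


$$C^{c}_{AB} = \frac{1}{2} \int \left| f(\cdot) - 1 \right| dX \qquad (11)$$


As in [17], $x_1$, $x_2$, $x_3$, $x_4$ can be defined such that $f(x_1) = f(x_2) = f(x_3) = f(x_4) = 1$ and


$$C^{c}_{AB} = \int_{x_1}^{x_2} \int_{x_3}^{x_4} [f(\cdot) - 1]\, dx_{AB}\, dx_{BA} \qquad (12)$$


where $x_1$, $x_2$, $x_3$ and $x_4$ are functions of $\beta$. Then


$$\frac{\partial C^{c}_{AB}}{\partial \beta} = \frac{\partial}{\partial \beta} \int_{x_1}^{x_2} \int_{x_3}^{x_4} [f(\cdot) - 1]\, dx_{AB}\, dx_{BA} = \frac{\partial x_2}{\partial \beta} \int_{x_3}^{x_4} [f(x_2) - 1]\, dx_{AB} - \frac{\partial x_1}{\partial \beta} \int_{x_3}^{x_4} [f(x_1) - 1]\, dx_{AB} + \int_{x_1}^{x_2} \frac{\partial}{\partial \beta} \int_{x_3}^{x_4} [f(\cdot) - 1]\, dx_{AB}\, dx_{BA} \qquad (13)$$


where


$$\frac{\partial}{\partial \beta} \int_{x_3}^{x_4} [f(\cdot) - 1]\, dx_{AB} = \frac{\partial x_4}{\partial \beta} [f(x_4) - 1] - \frac{\partial x_3}{\partial \beta} [f(x_3) - 1] + \int_{x_3}^{x_4} \frac{\partial}{\partial \beta} [f(\cdot) - 1]\, dx_{AB} = \int_{x_3}^{x_4} \frac{\partial f(\cdot)}{\partial \beta}\, dx_{AB} \qquad (14)$$


Following Lemma 9 in [17], we have


$$\int_{x_3}^{x_4} \frac{\partial}{\partial \beta} [f(\cdot) - 1]\, dx_{AB} > 0 \qquad (15)$$


With Equation (13), we have


$$\frac{\partial C^{c}_{AB}}{\partial \beta} > 0 \qquad (16)$$


This confirms the results of Theorem 1.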


5.2 Increasing Conflict with Fixed Rankings 


Figure 3 plots how certainty $C^{c}_{AB}$ varies with the weighted rankings $n_{AB}$ and $n_{BA}$ under a fixed sum
$n_{AB} + n_{BA}$ and a fixed $n_{\widetilde{AB}}$. This should confirm Property 2.


[Figure 3 plot. y-axis: certainty.]


Figure 3. Certainty is concave when $n_{AB} + n_{BA} + n_{\widetilde{AB}}$ and $n_{\widetilde{AB}}$ are fixed, and the minimum occurs at $n_{AB} = n_{BA}$.


THEOREM 2. For fixed $n_{\widetilde{AB}}$, the certainty $C^{c}_{AB}$ is decreasing when $n_{AB} \leq n_{BA}$ and increasing when $n_{AB} \geq n_{BA}$.


Proof: The details of the proof are omitted here, as it is similar to the proof of
Theorem 1. More specifically, by removing the absolute value sign and then differentiating, it can be shown
that the derivative is negative for $n_{AB} < n_{BA}$ and positive for $n_{AB} > n_{BA}$.


6. EXPERIMENTS 


In this section, we examine the empirical performance of the certainty-based preference completion
algorithm. In the experiments, we compare it with the
common majority voting algorithm [8] and the classic collaborative filtering (CF) algorithm [26]. Both our
certainty-based preference completion algorithm and the majority voting algorithm use the anchor-kNN
algorithm to find the k-nearest neighbors' rankings and use these rankings to conduct the preference
completion of the target user. In contrast, the collaborative filtering algorithm is rating-oriented:
it computes user similarity to find a user's neighbors and uses their ratings to
generate item predictions.


6.1 Datasets 


The experiments adopt two forms of datasets to evaluate algorithms’ performance. 


• One type of dataset is synthetic, created by a sampler using the Plackett-Luce model with
Algorithm 1. The synthetic dataset has over 20,000 rankings from agents on a set of 20
alternatives. Each ranking is generated with noise that follows a Gumbel distribution.

• The other is the Flixster dataset, which collects movie ratings by users together with social
trust information. It has over 8,000,000 ratings on over 2,000 movies. For the experiments, we convert the
ratings to rankings and select over 9,000 rankings on over 50 movies.


6.2 Evaluation Metrics 


We evaluate the performance with three metrics: (a) prediction error, (b) Spearman correlation coefficient, and
(c) Kendall rank correlation coefficient. The first one measures the quality of the predicted ranking, and the
others measure the degree of correlation between the predicted ranking and the original one. Please refer to
Pearson [27] and Liu et al. [2] for more details.


• Evaluation Metric 1: This evaluation metric estimates the accuracy of the predicted ranking against
the original true one,
where M is the maximum of the pairwise error, $Y_{i,j,k} = 1$ means that user i prefers alternative j to
alternative k in the predicted ranking, and $X_{i,j,k} = 1$ means that user i prefers alternative j to
alternative k in the original ranking. F(v) equals 1 when v < 0, and equals 0 otherwise.

• Evaluation Metric 2: The Spearman correlation coefficient measures the difference between the positions
of every alternative in the predicted ranking and the original one to evaluate the similarity between the
predicted ranking and the original one. The greater its value, the more precise the predicted ranking,
where $d_i$ represents the difference between the positions of alternative i in the predicted ranking and the
original one.

• Evaluation Metric 3: The Kendall rank correlation coefficient is very similar to Evaluation
Metric 2, except that it uses the Kendall distance to measure the correlation,
where the symbols in Equation (20) have the same meaning as in Evaluation Metric 1, and the two set symbols
denote the alternative sets in the original ranking and in the predicted ranking, respectively.
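For orientation, the two correlation metrics can be sketched with their textbook definitions. This is an assumption: the sketch uses the standard Spearman formula based on squared position differences and the standard Kendall coefficient based on concordant and discordant pairs, which may differ in normalization from the equations used in the experiments. The names `spearman_cc` and `kendall_cc` are hypothetical, and both rankings are assumed to contain the same alternatives (at least two) without ties.

```python
from itertools import combinations


def spearman_cc(predicted, original):
    """Textbook Spearman coefficient: 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(original)
    pos_p = {a: p for p, a in enumerate(predicted)}
    pos_o = {a: p for p, a in enumerate(original)}
    d_squared = sum((pos_p[a] - pos_o[a]) ** 2 for a in original)
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


def kendall_cc(predicted, original):
    """Textbook Kendall coefficient: (concordant - discordant) / number of pairs."""
    pos_p = {a: p for p, a in enumerate(predicted)}
    pos_o = {a: p for p, a in enumerate(original)}
    pairs = list(combinations(original, 2))
    score = 0
    for a, b in pairs:
        agree = (pos_p[a] - pos_p[b]) * (pos_o[a] - pos_o[b]) > 0
        score += 1 if agree else -1
    return score / len(pairs)
```

Both functions return 1 for identical rankings and -1 for fully reversed rankings, so higher values indicate a predicted ranking that is closer to the original one.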


6.3 Experimental Results on Synthetic Dataset and Flixster Dataset 


In this section, we conduct the experiments on a synthetic dataset and the Flixster dataset and present the
comparison results of the different approaches for each evaluation metric. The
prediction error measures the difference in pairwise preference between the predicted ranking and the original
ranking, and the goal is to reduce it as far as possible. The Spearman correlation
coefficient and the Kendall rank correlation coefficient measure the similarity between the predicted ranking
and the original ranking, and we expect their values to be as high as possible.


(a) Synthetic dataset 


NO 


As shown in Figure 4, it is very clear that the prediction error tends to be smaller when using certainty- 
based algorithm than the CF algorithm and the majority voting algorithm. In addition, the two ranking- 
oriented approaches outperform the rating-oriented approach. For one thing, the ranking contains 
more preference relation information over alternatives than rating score, and thus it may be easier 
and more accurate in finding the user’s neighbors and completing preference. As a result, the 
ranking-oriented approach has a lower prediction error. For another, the comparison between the 
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certainty-based voting algorithm and the majority voting algorithm shows the superiority of the 
certainty-based one. The preference completion algorithm with certainty considered does reduce 
the effect of randomness. 

e Figure 5(a) shows the performance of Spearman correlation coefficient. On this evaluation metric, 
the certainty-based voting algorithm performs better than the other two algorithms. This is because 
our approach with preference space and certainty considered can filter out those pair preferences 
which have close votes and have lower certainty. This behavior causes the predicted rank much more 
trustworthy. 

• Figure 5(b) shows the performance on the Kendall rank correlation coefficient. The conclusion is
similar to that for the Spearman correlation coefficient in Figure 5(a), so the explanation is not
repeated here.
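The idea of filtering out pairwise preferences with close votes can be sketched in Python as follows. This is only a simplified stand-in: it uses a normalised vote margin with a hypothetical threshold `min_margin`, not the Probability-Certainty Density Function on which the proposed certainty and conflict are based. The function name and the threshold value are assumptions made for this example.

```python
def vote_with_margin(votes_for, votes_against, min_margin=0.2):
    """Decide one pairwise preference (a vs. b) from neighbour votes.

    Returns 1 if a is preferred to b, -1 if b is preferred to a, and
    0 (abstain) when there are no votes or the normalised margin is
    below min_margin. The margin is a simplified proxy for certainty,
    not the paper's measure.
    """
    total = votes_for + votes_against
    if total == 0:
        return 0
    margin = (votes_for - votes_against) / total
    if abs(margin) < min_margin:
        return 0  # close vote: treated as too uncertain to decide
    return 1 if margin > 0 else -1


print(vote_with_margin(8, 2))   # clear majority for a
print(vote_with_margin(11, 9))  # close vote, filtered out
```

Setting `min_margin=0` makes this sketch decide every pair that has a non-tied vote, which corresponds to plain majority voting; a positive threshold leaves close pairs undecided instead of guessing.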


Overall, the experiments on the synthetic dataset support the effectiveness of our
proposed certainty-based preference completion algorithm.
Figure 4. Prediction error on synthetic dataset: x-axis denotes the number of neighbors. Plots show the prediction
error. For this evaluation metric, smaller values are better.
Figure 5. Performance on synthetic dataset: x-axis denotes the number of neighbors. Plots show the Spearman
correlation coefficient (Spearman CC) and Kendall rank correlation coefficient (Kendall CC). For both evaluation
metrics, higher values are better.


(b) Flixster dataset


The performance of the three approaches is also examined on a real-world dataset, the Flixster
dataset, which contains rating information. Because the proposed algorithm and the majority
voting algorithm both use the anchor-kNN algorithm, which needs ranking data instead of rating data, we
need to convert the rating data to ranking data first.
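One simple way to perform such a conversion is sketched below in Python. The conversion rule used in the experiments is not spelled out here, so this sketch assumes that a higher rating means a stronger preference and that items with equal ratings yield no pairwise preference; the function name and the example item identifiers are hypothetical.

```python
from itertools import combinations


def ratings_to_pairwise(ratings):
    """Return the set of (preferred, other) pairs implied by one user's ratings.

    Assumption: equal ratings produce no pair, so a user's converted
    data is in general a partial ranking.
    """
    prefs = set()
    for a, b in combinations(sorted(ratings), 2):
        if ratings[a] > ratings[b]:
            prefs.add((a, b))
        elif ratings[b] > ratings[a]:
            prefs.add((b, a))
    return prefs


user_ratings = {"m1": 5, "m2": 3, "m3": 3}
print(ratings_to_pairwise(user_ratings))
```

In this example the tie between `m2` and `m3` leaves that pair undecided, which illustrates how information about some pairwise preferences can be lost or distorted during conversion.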


• As shown in Figure 6, when the number of neighbors k > 300, our approach outperforms the other
two, and the ranking-oriented method still performs better than the rating-oriented method. When
k < 300, however, the result is not as expected. A possible reason is that the process
of converting rating data to ranking data inevitably introduces errors in the pairwise preferences. With
more neighbors considered, our proposed algorithm shows its superiority, and the prediction error
decreases as the number of neighbors grows.

• In Figure 7(a), the certainty-based approach outperforms the other two approaches
significantly. This is consistent with the experiments on the synthetic dataset.

• Figure 7(b) shows performance similar to that in Figure 7(a).


In general, the experiments on the synthetic dataset and the Flixster dataset validate our
proposed certainty-based preference completion algorithm.
Figure 6. Prediction error on Flixster dataset: x-axis denotes the number of neighbors. Plots show the prediction
error. For this evaluation metric, smaller values are better.
Figure 7. Performance on Flixster dataset: x-axis denotes the number of neighbors. Plots show the Spearman
correlation coefficient (Spearman CC) and Kendall rank correlation coefficient (Kendall CC). For both evaluation
metrics, higher values are better.
7. CONCLUSION AND FUTURE WORK


Because the agents’ rankings are nondeterministic, i.e., agents may provide their rankings
under noisy conditions, it is necessary to conduct certainty-based preference
completion. In this paper, a bijection has first been built for alternative pairs from the ranking
space to the preference space, and the certainty and conflict of alternative pairs have been evaluated
with a statistical measurement, the Probability-Certainty Density Function. Then, a certainty-based
voting algorithm built on this certainty and conflict has been used to conduct the preference completion.
More specifically, the proposed algorithm obtains the ranking with high certainty and low conflict and
uses it to complete the preference. Moreover, the properties of the proposed certainty and conflict
have been studied empirically, and the proposed approach has been experimentally validated against
state-of-the-art approaches on several datasets.


In real applications, the data is usually unbalanced [28], i.e., some alternative pairs have many
rankings, while others have only a few. In our future work, we will propose algorithms to handle
unbalanced preference completion both effectively and efficiently.